Semantic Multi-modal Analysis, Structuring, and Visualization for Candid Personal Interaction Videos

Author

  • Alexander Haubold
Abstract

Videos are rich in multimedia content and semantics, which should be used by video browsers to better present the audio-visual information to the viewer. Ubiquitous video players allow for content to be scanned linearly, rarely providing summaries or methods for searching. Through analysis of audio and video tracks, it is possible to extract text transcripts from audio, displayed text from video, and higher-level semantics through speaker identification and scene analysis. External data sources, when available, can be used to cross-reference the video content and impose a structure for organization. Various research tools have addressed video summarization and browsing using one or more of these modalities; however, most of them assume edited videos as input. We focus our research on genres in personal interaction videos and collections of such videos in their unedited form. We present and verify formal models for their structure, and develop methods for their automatic analysis, summarization, and indexing. We specify the characteristic semantic components of three related genres of candidly captured videos: formal instructions or lectures, student team project presentations, and discussions. For each genre, we design and validate a separate multi-modal approach to the segmentation and structuring of its content. We develop novel user interfaces to support browsing and searching the multi-modal video information, and introduce the tool in a classroom environment with ≈160 students per semester. UI elements are designed according to the underlying video structure to address video browsing in a structured multi-modal space. These user interfaces include image/video browsers, audio/video segmentation browsers, and text/filtered ASR transcript browsers. Through several user studies, we evaluate and refine our indexing methods, browser interface, and the tool's usefulness in the classroom.
We propose a core/module methodology for the analysis, structuring, and visualization of personal interaction videos. Analysis, structuring, and visualization techniques in the core are common to all genres. Modular features are characteristic of individual video genres and are applied selectively. The structure of interactions in each video is derived from the combination of the resulting audio, visual, and textual features. We expect that the framework can be applied to genres not covered here by adding or replacing a few characteristic modules.

Similar resources

Zero-Shot Event Detection by Multimodal Distributional Semantic Embedding of Videos

We propose a new zero-shot Event Detection method by Multi-modal Distributional Semantic embedding of videos. Our model embeds object and action concepts as well as other available modalities from videos into a distributional semantic space. To our knowledge, this is the first Zero-Shot event detection model that is built on top of distributional semantics and extends it in the following direct...

An Information-Theoretic Framework towards Large-Scale Video Structuring, Threading, and Retrieval

Winston H. Hsu. Video and image retrieval has been an active and challenging research area due to the explosive growth of online video data, personal video recordings, digital photos, and broadcast news videos. In order to effectively manage and use such enormous multimedia resources, users need to...

Trial Realization of Human-Centered Multimedia Navigation for Video Retrieval

A trial realization of human-centered navigation for video retrieval is presented in this paper. This system consists of the following functions: (i) multi-modal analysis for collaborative use of multimedia data, (ii) preference extraction for the system to adapt to users' individual demands, and (iii) adaptive visualization for users to be guided to their desired contents. By using these funct...

Fusing Biomedical Multi-modal Data for Exploratory Data Analysis

Data analysis in modern biomedical research has to integrate data from different sources, such as microarray, clinical, and categorical data, so-called multi-modal data. The reef SOM, a metaphoric display, is applied and further improved such that it allows the simultaneous display of biomedical multi-modal data for an exploratory analysis. Visualizations of microarray, clinical, and category data ...

Bayesian non-parametrics for multi-modal segmentation

Segmentation is a fundamental problem in computer vision research, with applications in many tasks such as object recognition, content-based image retrieval, and semantic labelling. Partitioning the data into groups that are coherent in one or more characteristics, such as semantic classes, is often a first step towards understanding the content of data. As information in the real world is ...

Publication date: 2006